We can generate fake data based on the assumption that a variable follows a certain distribution.
- We randomly sample observations from the distribution.
age <- runif(1000, min = 15, max = 75)
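The same pattern extends to other distributions, since base R pairs each distribution with an `r*` sampling function. A minimal sketch, with parameter values and variable names (`height`, `vote`) chosen only for illustration:

```r
n <- 1000

# Continuous uniform: every value between min and max is equally likely
age <- runif(n, min = 15, max = 75)

# Normal: values cluster around the mean, with spread set by sd
height <- rnorm(n, mean = 67, sd = 3)

# Binomial with size = 1: a 0/1 outcome that equals 1 with probability prob
vote <- rbinom(n, size = 1, prob = 0.6)

# Each call returns one draw per observation
lengths(list(age = age, height = height, vote = vote))
```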
Since there is randomness involved, we will get a different result each time we run the code.
runif(3, min = 15, max = 75)
[1] 31.73359 31.90095 42.28072
runif(3, min = 15, max = 75)
[1] 54.75875 54.92569 32.84388
To make a reproducible random sample, we first set the seed:
set.seed(94301)
runif(3, min = 15, max = 75)
[1] 59.68333 30.27768 38.29962
set.seed(94301)
runif(3, min = 15, max = 75)
[1] 59.68333 30.27768 38.29962
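Rather than comparing the printed numbers by eye, the two draws can be compared programmatically. A small sketch (the variable names are hypothetical):

```r
# Draw the same sample twice, resetting the seed before each draw
set.seed(94301)
first_draw <- runif(3, min = 15, max = 75)

set.seed(94301)
second_draw <- runif(3, min = 15, max = 75)

# The same seed reproduces exactly the same numbers
identical(first_draw, second_draw)

# Without resetting the seed, the next call continues the random
# stream, so it is expected to produce different numbers
third_draw <- runif(3, min = 15, max = 75)
identical(first_draw, third_draw)
```

The seed fixes the whole sequence of random numbers, not just the next call, so it only needs to be set once at the top of a script.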
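library(tidyverse)  # provides tibble(), mutate(), and ggplot()

set.seed(435)
# Simulate 1,000 people: fake names, normal heights, uniform ages,
# and a 0/1 indicator that equals 1 with probability 0.6
fake_data <- tibble(names = charlatan::ch_name(1000),
                    height = rnorm(1000, mean = 67, sd = 3),
                    age = runif(1000, min = 15, max = 75),
                    measure = rbinom(1000, size = 1, prob = 0.6)) |>
  mutate(supports_measure_A = ifelse(measure == 1, "yes", "no"))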
head(fake_data)
# A tibble: 6 × 5
  names                 height   age measure supports_measure_A
  <chr>                  <dbl> <dbl>   <int> <chr>
1 Elbridge Kautzer        67.4  66.3       1 yes
2 Brandon King            65.0  61.5       0 no
3 Phyllis Thompson        68.1  53.8       1 yes
4 Humberto Corwin         67.5  33.9       1 yes
5 Theresia Koelpin        71.4  16.1       1 yes
6 Hayden O'Reilly-Johns   66.2  37.0       0 no
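One way to confirm that simulated columns match the assumed distributions is to compare sample summaries with the parameters passed to each generator. A minimal sketch using a base R data frame (`sim` is a hypothetical stand-in for `fake_data` that omits the `charlatan` names, so its values will not match the tibble above):

```r
set.seed(435)
n <- 1000

sim <- data.frame(
  height  = rnorm(n, mean = 67, sd = 3),
  age     = runif(n, min = 15, max = 75),
  measure = rbinom(n, size = 1, prob = 0.6)
)
sim$supports_measure_A <- ifelse(sim$measure == 1, "yes", "no")

mean(sim$height)    # should be close to 67
sd(sim$height)      # should be close to 3
range(sim$age)      # should fall inside 15 to 75
mean(sim$measure)   # share of 1s, should be close to 0.6
table(sim$supports_measure_A)
```

With 1,000 observations the summaries will not equal the parameters exactly, but large gaps would suggest a mistake in the simulation code.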
Check that the ages look uniformly distributed, here plotted separately for supporters and non-supporters of the measure.
fake_data |>
  ggplot(aes(y = supports_measure_A,
             x = age,
             fill = supports_measure_A)) +
  ggridges::geom_density_ridges(show.legend = FALSE) +
  scale_fill_brewer(palette = "Paired") +
  theme_bw() +
  labs(x = "Age (years)",
       y = "",
       subtitle = "Support for Measure A")
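A numerical check can complement the plot. The sketch below is an illustration rather than part of the original analysis: it regenerates ages in a standalone vector (`sim_age`, a hypothetical name, so the values differ from `fake_data$age`), counts observations in 10-year bins, and runs a Kolmogorov-Smirnov test against the assumed uniform distribution.

```r
set.seed(435)
sim_age <- runif(1000, min = 15, max = 75)

# Count observations in six 10-year bins; under a uniform
# distribution each bin should hold roughly 1000 / 6 (about 167)
bin_counts <- table(cut(sim_age, breaks = seq(15, 75, by = 10),
                        include.lowest = TRUE))
bin_counts

# Kolmogorov-Smirnov test against Uniform(15, 75);
# a large p-value means no evidence against uniformity
ks_result <- ks.test(sim_age, "punif", min = 15, max = 75)
ks_result$p.value
```

Bin counts that hover around the same value, together with a p-value that is not small, are consistent with uniformly distributed ages.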